What Your Username Says About You
Usernames are ubiquitous on the Internet, and they are often suggestive of
user demographics. This work examines how well gender and language can be
inferred from a username alone, using unsupervised morphology induction to
decompose usernames into sub-units. Experimental results on the
two tasks demonstrate the effectiveness of the proposed morphological features
compared to a character n-gram baseline.
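For readers unfamiliar with the baseline, the sketch below shows one common way to turn a username into character n-gram count features. It is an illustrative assumption, not the authors' code: the function name `char_ngrams`, the lowercasing, the `^`/`$` boundary markers, and the default n-gram range are all hypothetical choices.

```python
from collections import Counter


def char_ngrams(username: str, n_min: int = 1, n_max: int = 4) -> Counter:
    """Count character n-grams of a username (hypothetical baseline features).

    The name is lowercased and wrapped in ^ and $ so that n-grams at the
    start and end of the name are distinguishable from those in the middle.
    """
    padded = f"^{username.lower()}$"
    counts: Counter = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the padded string.
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts


# Bigram and trigram features for an example username.
features = char_ngrams("Sarah85", n_min=2, n_max=3)
```

Counts like these would typically be fed to a linear classifier; the morphological features described above instead represent a username by the sub-units produced by an unsupervised segmenter, which can capture name parts that fixed-width n-grams split arbitrarily.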
Talking to the crowd: What do people react to in online discussions?
This paper addresses the question of how language use affects community
reaction to comments in online discussion forums, and the relative importance
of the message vs. the messenger. A new comment ranking task is proposed based
on community annotated karma in Reddit discussions, which controls for topic
and timing of comments. Experimental work with discussion threads from six
subreddits shows that the importance of different types of language features
varies with the community of interest.
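The abstract does not say exactly how topic and timing are controlled. One simple way to do so, assumed here purely for illustration, is to compare comments only within the same thread (same topic) and the same posting-time window, then rank each group by karma. The `Comment` fields, the `ranking_groups` helper, and the 60-minute window are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Comment:
    thread_id: str              # hypothetical: identifies the discussion thread
    minutes_after_post: float   # hypothetical: comment time relative to the submission
    karma: int
    text: str


def ranking_groups(comments, window_minutes=60.0):
    """Group comments by (thread, time bin); sort each group by karma, highest first.

    Restricting comparisons to one thread holds the topic fixed, and the time
    bin keeps early comments from being ranked against much later ones.
    """
    groups = defaultdict(list)
    for c in comments:
        time_bin = int(c.minutes_after_post // window_minutes)
        groups[(c.thread_id, time_bin)].append(c)
    return {key: sorted(group, key=lambda c: c.karma, reverse=True)
            for key, group in groups.items()}
```

A ranking model would then be trained and evaluated on the order within each group rather than on raw karma, which varies widely across threads and posting times.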
Hierarchical Character-Word Models for Language Identification
Social media messages' brevity and unconventional spelling pose a challenge
to language identification. We introduce a hierarchical model that learns
character and contextualized word-level representations for language
identification. Our method performs well against strong baselines, and can
also reveal code-switching.